Abstract

The  sum  of  the  largest  k  eigenvalues  of  a  symmetric  matrix  has 
a  well  known  extremal  property  which  was  given  by  Ky  Fan  in  1949. 
We  discuss  a  simple  proof  of  this  property  which  seems  to  have  been 
overlooked in the vast literature on the subject and its many
generalizations. The key step is the observation, which is neither new nor
well known, that the convex hull of the set of projection matrices of rank
k is the set of symmetric matrices with eigenvalues between 0 and 1
and summing to k. The connection with the well known Birkhoff theorem
on doubly stochastic matrices is also discussed. Our approach
provides  a  very  convenient  characterization  for  the  subdifferential  of 
the  eigenvalue  sum,  to  be  described  in  a  subsequent  paper. 

Let $A$ be an $n$ by $n$ real symmetric matrix, with eigenvalues

$$\lambda_1 \geq \cdots \geq \lambda_n$$

and a corresponding orthonormal set of eigenvectors $q_1, \ldots, q_n$; thus

$$A = Q \Lambda Q^T, \qquad Q^T Q = I_n,$$

where $\Lambda = \operatorname{Diag}(\lambda_1, \ldots, \lambda_n)$ and
$Q = [q_1, \ldots, q_n]$. In 1949, the following theorem was proved by
Ky Fan [Fan]:
Theorem 1


$$\sum_{i=1}^{k} \lambda_i = \max_{X^T X = I_k} \operatorname{tr} X^T A X. \tag{1}$$

Here $I_k$ is the identity matrix of order $k$, and hence $X$ is a matrix
whose columns are $k$ orthonormal vectors in $\mathbb{R}^n$. (All matrices
are assumed to be real, but extension to the case where $A$ is complex
Hermitian is straightforward.)
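Theorem 1 can be illustrated numerically. The following Python sketch, which is not part of Fan's argument, builds a symmetric matrix with a prescribed spectrum and compares the objective in (1) at $X = [q_1, \ldots, q_k]$ with its value at randomly sampled orthonormal $X$. The helper names, the test eigenvalues and the sample size are illustrative assumptions only.

```python
import math
import random


def orthonormalize(vectors):
    """Modified Gram-Schmidt: orthonormal vectors spanning the inputs."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def random_orthonormal(rng, n, k):
    """k orthonormal vectors in R^n obtained from Gaussian samples."""
    return orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)]
                           for _ in range(k)])


def fan_objective(A, X_cols):
    """tr(X^T A X), with X given as a list of its columns."""
    n = len(A)
    return sum(x[i] * A[i][j] * x[j]
               for x in X_cols for i in range(n) for j in range(n))


rng = random.Random(0)
n, k = 5, 2
lam = [4.0, 3.0, 1.0, 0.0, -2.0]          # assumed eigenvalues, decreasing
Q_cols = random_orthonormal(rng, n, n)     # eigenvectors q_1, ..., q_n

# A = Q Diag(lam) Q^T, assembled entry by entry.
A = [[sum(lam[m] * Q_cols[m][i] * Q_cols[m][j] for m in range(n))
      for j in range(n)] for i in range(n)]

top_k_sum = sum(lam[:k])                   # left-hand side of (1)
attained = fan_objective(A, Q_cols[:k])    # objective at X = [q_1, ..., q_k]
best_random = max(fan_objective(A, random_orthonormal(rng, n, k))
                  for _ in range(2000))

print(top_k_sum, attained, best_random)
```

Random sampling can only suggest the inequality; it never exceeds the eigenvalue sum here, while the eigenvector choice attains it up to rounding.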

In the case $k = 1$, the theorem reduces to the Rayleigh principle; in the
case $k = n$ it states only that the trace of a square matrix is the sum of
its eigenvalues. Cases $1 < k < n$ are natural generalizations of the
Rayleigh principle and are also reminiscent of the Courant-Fischer theorem
which first appeared in 1905 [Fis]. (The Courant and Fischer versions differ
in the infinite-dimensional case.) The Courant-Fischer theorem may be stated
succinctly as follows:

Theorem  2 

$$\lambda_k = \max_{X^T X = I_k} \; \min_{v^T v = 1} \; v^T X^T A X v.$$
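For $k = 2$ the inner minimum in Theorem 2 is the smaller eigenvalue of the 2 by 2 matrix $X^T A X$, which has a closed form, so the statement can be spot-checked without an eigenvalue solver. The sketch below is only an illustration under assumed data (the spectrum, seed and helper names are hypothetical).

```python
import math
import random


def orthonormalize(vectors):
    """Modified Gram-Schmidt: orthonormal vectors spanning the inputs."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def bilinear(A, x, y):
    """x^T A y for a square matrix A stored as a list of rows."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def inner_min(A, x1, x2):
    """Minimum of v^T (X^T A X) v over unit v in R^2, where X = [x1, x2];
    this is the smaller eigenvalue of the 2 by 2 matrix X^T A X."""
    a, b, c = bilinear(A, x1, x1), bilinear(A, x1, x2), bilinear(A, x2, x2)
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)


rng = random.Random(4)
n, k = 4, 2
lam = [3.0, 1.5, 0.5, -1.0]               # assumed eigenvalues, decreasing
Q_cols = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)]
                         for _ in range(n)])
A = [[sum(lam[m] * Q_cols[m][i] * Q_cols[m][j] for m in range(n))
      for j in range(n)] for i in range(n)]

# Inner minimum at X = [q_1, q_2], and the best value over random X.
at_eigenvectors = inner_min(A, Q_cols[0], Q_cols[1])
best_random = max(
    inner_min(A, *orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)]
                                  for _ in range(k)]))
    for _ in range(2000))

print(lam[k - 1], at_eigenvectors, best_random)
```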

For both Theorems 1 and 2, it is immediately clear that the right-hand side
is greater than or equal to the left by taking $X = [q_1, \ldots, q_k]$. The
proof of the Courant-Fischer theorem is completed by the following argument:
for any $X$, take $v$ to be a unit vector orthogonal to the first $k - 1$
rows of $Q^T X$, so that the first $k - 1$ components of $Q^T X v$ vanish;
then $v^T X^T A X v \leq \lambda_k$. Completing the proof of Fan's theorem is
somewhat harder, however. Clearly, letting $Y = Q^T X$, the result is
equivalent to the following:

$$\sum_{i=1}^{k} \lambda_i = \max_{Y^T Y = I_k} \operatorname{tr} Y^T \Lambda Y.$$

Since $\Lambda$ is diagonal,

$$\operatorname{tr} Y^T \Lambda Y = \sum_{j=1}^{k} \sum_{i=1}^{n} \lambda_i y_{ij}^2, \tag{2}$$

and  one  might  suppose  the  rest  of  the  proof  to  be  straightforward.  Only  a 
few lines of inequalities are in fact required, but deriving these is not a
completely trivial exercise; for the details, see Fan's original proof. Obtaining
Theorem  1  as  a  consequence  of  Theorem  2  does  not  seem  to  be  any  easier. 
Another  approach  uses  properties  of  doubly  stochastic  matrices  and  will  be 
described below. Therefore, although it is hard to imagine that Rayleigh,
Fischer or Courant would have been surprised by Fan's result, it is quite
plausible that they were not aware of it. Actually, [MO] points out that
Theorem  1  is  a  special  case  of  a  much  more  general  but  less  well  known 
result  of  von  Neumann  dating  to  1937  [vN]: 

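$$\sum_{i=1}^{n} \sigma_i \tau_i = \max_{U^T U = V^T V = I_n} \operatorname{tr} UBVC,$$

where $\sigma_1 \geq \cdots \geq \sigma_n$ and $\tau_1 \geq \cdots \geq \tau_n$
are respectively the singular values of the $n$ by $n$ nonsymmetric matrices
$B$ and $C$. See [MO, Ch. 20] for the proof.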

There  is  a  vast  literature  on  various  inequalities  for  sums  and  products  of 
eigenvalues  of  symmetric  matrices;  see  particularly  [BB,  Ch.  2],  [Bel,  Ch.  8], 
[MM,  Part  II],  [Fri]  and  references  therein.  However,  we  are  not  aware  that 
any  of  the  many  results  available  in  the  literature  are  particularly  relevant 
to  the  discussion  here,  except  as  noted  below. 

The  purpose  of  this  note  is  to  describe  an  easy  but  interesting  result 
which  then  leads  to  a  trivial  proof  of  Fan's  theorem.  This  result,  while  not 
new,  is  so  beautifully  simple  that  its  obscurity  is  surprising: 

Theorem 3 Let

$$\Omega_1 = \{ YY^T : Y^T Y = I_k \}$$

and

$$\Omega_2 = \{ W : W = W^T, \ \operatorname{tr} W = k, \ 0 \leq W \leq I \}.$$

Here $Y$ has dimension $n$ by $k$, so that $YY^T$ is a projection matrix of
order $n$ and rank $k$, and $W$ has dimension $n$ by $n$, with the last
condition meaning that $W$ and $I - W$ are both positive semi-definite. Then
$\Omega_2$ is the convex hull of $\Omega_1$, and $\Omega_1$ is the set of
extreme points of $\Omega_2$.

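Theorem 1 then follows as a consequence, because

$$\operatorname{tr} Y^T \Lambda Y = \operatorname{tr} Y Y^T \Lambda, \tag{3}$$

and maximizing this linear function over $YY^T \in \Omega_1$ is equivalent to
maximizing it over $W \in \Omega_2$. Since

$$\operatorname{tr} W \Lambda = \sum_{i=1}^{n} \lambda_i w_{ii},$$

the fact that the maximum value is $\sum_{i=1}^{k} \lambda_i$ follows
trivially from the conditions on $W$, which force $0 \leq w_{ii} \leq 1$ and
$\sum_{i=1}^{n} w_{ii} = k$. Equivalently,

$$\sum_{i=1}^{k} \lambda_i = \max_{V \in \Omega_2} \operatorname{tr} VA, \tag{4}$$

as $V \in \Omega_2$ if and only if $W = Q^T V Q \in \Omega_2$.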
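The reduction to the diagonal of $W$ can be illustrated with a short Python sketch: a random convex combination of rank $k$ projections is formed, and its trace, its diagonal entries and the value of $\operatorname{tr} W \Lambda$ are compared with the bounds used above. This is an informal check under assumed data (the eigenvalues, the weights and all helper names are hypothetical), not a proof.

```python
import math
import random


def orthonormalize(vectors):
    """Modified Gram-Schmidt: orthonormal vectors spanning the inputs."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def random_projection(rng, n, k):
    """Y Y^T for a random n by k matrix Y with orthonormal columns."""
    Y_cols = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)]
                             for _ in range(k)])
    return [[sum(y[i] * y[j] for y in Y_cols) for j in range(n)]
            for i in range(n)]


rng = random.Random(1)
n, k = 4, 2
lam = [5.0, 2.0, 1.0, -1.0]               # assumed diagonal of Lambda

# Convex weights and the corresponding combination W of elements of Omega_1.
raw = [rng.random() for _ in range(6)]
weights = [r / sum(raw) for r in raw]
projections = [random_projection(rng, n, k) for _ in weights]
W = [[sum(w * P[i][j] for w, P in zip(weights, projections))
      for j in range(n)] for i in range(n)]

diag = [W[i][i] for i in range(n)]
trace_W = sum(diag)                                   # should equal k
objective = sum(l * d for l, d in zip(lam, diag))     # tr(W Lambda)
bound = sum(lam[:k])                                  # sum of k largest

print(trace_W, diag, objective, bound)
```

The bound is attained by the diagonal projection with ones in its first $k$ diagonal positions, which is the element of $\Omega_1$ corresponding to $Y = [e_1, \ldots, e_k]$.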

We derived Theorem 3 before we found it in the literature. Our proof
is as follows. The fact that any convex combination of elements of
$\Omega_1$ lies in $\Omega_2$ is immediate. Also, using the spectral
decomposition of $W$, which has eigenvalues lying between 0 and 1 which sum
to $k$, it is clear that any element of $\Omega_2$ with rank greater than $k$
is not an extreme point. The only candidates for extreme points, then, are
those with rank $k$, i.e. the elements of $\Omega_1$. But it is not possible
that some rank $k$ elements are extreme points and others not, since the
definition of $\Omega_2$ does not in any way distinguish between different
rank $k$ elements. Since a compact convex set must have extreme points, and
is in fact the convex hull of its extreme points, the proof is complete.
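The step excluding elements of rank greater than $k$ is easy to see on a small diagonal example. The sketch below, with $n = 3$ and $k = 2$ chosen purely for illustration, exhibits a rank 3 element of $\Omega_2$ as the midpoint of two distinct rank 2 projections, so it cannot be an extreme point. Diagonal matrices are stored as lists of their diagonal entries, and exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction

k = 2


def in_omega2_diagonal(diag, k):
    """Membership in Omega_2 for a diagonal matrix: all diagonal entries
    (its eigenvalues) lie in [0, 1] and the trace equals k."""
    return all(0 <= d <= 1 for d in diag) and sum(diag) == k


W = [Fraction(1), Fraction(1, 2), Fraction(1, 2)]   # rank 3 > k
P1 = [Fraction(1), Fraction(1), Fraction(0)]        # rank k projection
P2 = [Fraction(1), Fraction(0), Fraction(1)]        # another rank k projection

# W is the midpoint of two distinct elements of Omega_2.
midpoint = [(p + q) / 2 for p, q in zip(P1, P2)]

print(in_omega2_diagonal(W, k), midpoint == W, P1 != P2)
```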

A  slightly  different  proof  of  this  theorem  was  given  in  1971  by  Fillmore 
and  Williams  [FW],  a  paper  whose  existence  was  recently  brought  to  our 
attention  by  H.  Woerdeman  and  C.-K.  Li.  The  paper  is  primarily  concerned 
with  the  numerical  range  of  a  matrix,  which  in  the  symmetric  case  is  simply 
the line segment $[\lambda_n, \lambda_1]$, together with generalizations of this notion, and
is referenced in the subsequent literature on generalized numerical ranges,
e.g. [GS, Poo]. However, [FW] does not seem to be well known in the
general  linear  algebra  community.  We  do  not  know  of  an  explicit  statement 
of  Theorem  3  which  appeared  before  1971.  S.  Friedland  has  pointed  out 
that  the  result  may  be  obtained,  in  fact  in  a  more  general  form,  by  using 
Theorem  4  below  in  conjunction  with  the  classical  technique  of  majorization 
[MO]  and  that  furthermore  the  result  is  related  to  a  1950  theorem  of  Lidskii 
[Kat, p. 145]; however such approaches to Theorem 3 are considerably more
complicated  than  the  trivial  proof  just  given.  It  seems  very  surprising  that 
Theorem  3  is  so  little  known,  especially  given  its  resemblance  to  the  following 
famous  theorem: 

Theorem 4 Let $\Omega_3$ be the set of $n$ by $n$ permutation matrices, and
let $\Omega_4$ be the set of $n$ by $n$ doubly stochastic matrices, i.e.
nonnegative matrices whose row sums and column sums are one. Then $\Omega_4$
is the convex hull of $\Omega_3$, and $\Omega_3$ is the set of extreme points
of $\Omega_4$.

This  theorem  is  usually  attributed  to  Birkhoff  who  discovered  it  in  1946, 
although Chvátal [Chv, Ch. 20] notes that it was given by König in 1936
[Kon,  p.  381].  It  has  been  rediscovered,  reproved  in  various  ways,  and 
generalized  by  many  authors;  see  [MO,  Ch.  2]  for  an  extensive  discussion. 

Theorem 4 has also been used as the basis for proving Fan's theorem,
e.g. [RV, Ch. 6] and [MO, Ch. 20]. The former reference proves Theorem 1
as a consequence of some general inequalities proved using Theorem 4; the
latter gives a more direct proof as follows. We have already noted that the
maximization objective in Theorem 1 may be written in the forms (2) and
(3); another equivalent form is

$$\lambda^T Z e \tag{5}$$

where $e$ is the $k$-dimensional vector with all elements equal to one,
$\lambda = [\lambda_1, \ldots, \lambda_n]^T$, and $Z$ is the matrix of
dimension $n$ by $k$ whose elements are defined by $z_{ij} = y_{ij}^2$. Since
$Y$ has orthonormal columns, the column sums of $Z$ are one and the row sums
are less than or equal to one. Clearly there exists an $n$ by $n$ doubly
stochastic matrix whose first $k$ columns are the columns of $Z$ (simply
extend $Y$ to a square orthogonal matrix). Consequently, maximizing (5) over
permissible values of $Z$ cannot give a larger value than maximizing

$$\lambda^T S f \tag{6}$$

over all $n$ by $n$ doubly stochastic matrices $S$, where $f$ is the vector
with first $k$ elements equal to one and last $n - k$ elements equal to zero.
By Theorem 4, this is equivalent to maximizing (6) over the permutation
matrices, giving an upper bound of $\sum_{i=1}^{k} \lambda_i$ for the
maximization objective and completing the proof of Theorem 1. (The last step
is actually slightly different in [MO], using majorization instead of
Theorem 4.) Note, by the way, that not all doubly stochastic matrices can be
obtained using such a construction ([MO, Ch. 2]). This does not affect the
proof since, as noted at the beginning of the discussion, the lower bound for
the maximization objective is immediate.
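The ingredients of this second proof can be traced in a small Python sketch. It takes $Y$ to be the first $k$ columns of a random orthogonal matrix (so the extension to a square orthogonal matrix is available by construction), forms $Z$ and the doubly stochastic matrix $S$ of squared entries, and maximizes (6) over the permutation matrices by enumeration. The data and names are illustrative assumptions, and enumeration is feasible only for very small $n$.

```python
import itertools
import math
import random


def orthonormalize(vectors):
    """Modified Gram-Schmidt: orthonormal vectors spanning the inputs."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


rng = random.Random(2)
n, k = 4, 2
lam = [5.0, 2.0, 1.0, -1.0]               # assumed eigenvalues, decreasing

# Columns of a random orthogonal matrix; Y consists of the first k of them.
U_cols = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)]
                         for _ in range(n)])
S = [[U_cols[j][i] ** 2 for j in range(n)] for i in range(n)]  # doubly stochastic
Z = [row[:k] for row in S]                                     # z_ij = y_ij^2

col_sums_Z = [sum(Z[i][j] for i in range(n)) for j in range(k)]
row_sums_Z = [sum(row) for row in Z]
row_sums_S = [sum(row) for row in S]
col_sums_S = [sum(S[i][j] for i in range(n)) for j in range(n)]

f = [1.0] * k + [0.0] * (n - k)
objective_5 = sum(lam[i] * Z[i][j] for i in range(n) for j in range(k))
objective_6 = sum(lam[i] * S[i][j] * f[j] for i in range(n) for j in range(n))

# For a permutation matrix P with P[i][perm[i]] = 1, lam^T P f is the sum
# of lam[i] * f[perm[i]]; maximize it over all n! permutations.
best_perm_value = max(sum(lam[i] * f[perm[i]] for i in range(n))
                      for perm in itertools.permutations(range(n)))

print(objective_5, objective_6, best_perm_value)
```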

The parallels and differences in the two proofs of Theorem 1 given above
are quite striking. The first proof used Theorem 3, writing the maximization
over the nonconvex set $\Omega_1$ and noting that maximizing instead over its
convex hull $\Omega_2$ led to the desired conclusion. The second proof used
Theorem 4, writing the maximization over the convex set $\Omega_4$ and noting
that maximizing instead over its extreme points $\Omega_3$ led to the desired
conclusion. (For a third proof, see [RW].)

It is a well known consequence of Fan's theorem that, because the left-hand
side of (1) is the pointwise maximum of the linear functions of $A$ on the
right-hand side, the sum of the first $k$ eigenvalues of a matrix is a convex
function of its elements. The same property is also deduced from (4). We
show in [OW] that formulas for the subdifferential of the eigenvalue sum
may be derived from either of these two max formulations, but that the
latter is particularly useful, for a reason related to the discussion in the
preceding paragraph: $\Omega_2$ is convex, while $\Omega_1$ is not. It
follows from either derivation that the sum of the first $k$ eigenvalues is a
smooth function of the matrix elements, i.e. the subdifferential reduces to a
gradient, if and only if $\lambda_k > \lambda_{k+1}$. Our interest in this
subject arose from consideration of minimizing sums of eigenvalues of
matrices which are specified only in part; see [CDVV] for some applications
and [OW] for further details. For the important special case $k = 1$, see
[Ove].
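Both the convexity and the smoothness condition can be seen in the smallest case $n = 2$, $k = 1$, where the largest eigenvalue has a closed form. The following Python sketch is a hedged illustration only: it tests the convexity inequality on random pairs of symmetric 2 by 2 matrices, and compares one-sided difference quotients along the direction $\operatorname{Diag}(1, -1)$ at a matrix with $\lambda_1 = \lambda_2$ and at one with $\lambda_1 > \lambda_2$. The step size, sample count and function names are assumptions.

```python
import math
import random


def lambda_max_2x2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)


# Convexity: f(t A + (1 - t) B) <= t f(A) + (1 - t) f(B) on random samples.
rng = random.Random(3)
convex_ok = True
for _ in range(1000):
    A = [rng.uniform(-1.0, 1.0) for _ in range(3)]   # entries (a, b, c)
    B = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    t = rng.random()
    mix = [t * x + (1.0 - t) * y for x, y in zip(A, B)]
    lhs = lambda_max_2x2(*mix)
    rhs = t * lambda_max_2x2(*A) + (1.0 - t) * lambda_max_2x2(*B)
    convex_ok = convex_ok and lhs <= rhs + 1e-12


def one_sided_slopes(a, c, h=1e-6):
    """Left and right difference quotients of the largest eigenvalue of
    Diag(a, c) along the direction Diag(1, -1)."""
    base = lambda_max_2x2(a, 0.0, c)
    right = (lambda_max_2x2(a + h, 0.0, c - h) - base) / h
    left = (base - lambda_max_2x2(a - h, 0.0, c + h)) / h
    return left, right


kink = one_sided_slopes(1.0, 1.0)     # lambda_1 = lambda_2: slopes differ
smooth = one_sided_slopes(2.0, 1.0)   # lambda_1 > lambda_2: slopes agree

print(convex_ok, kink, smooth)
```

At the identity the two one-sided slopes are close to $-1$ and $+1$, so no gradient exists there, whereas at $\operatorname{Diag}(2, 1)$ both are close to 1.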